Add rendering functionals #1618
```python
def _write_depth_tested_pixel(
    x: int,
    y: int,
    z: wp.float32,
    color: wp.vec4,
    width: int,
    height: int,
    rgba: wp.array(dtype=wp.vec4),
    depth: wp.array(dtype=wp.float32),
):
    if x >= 0 and x < width and y >= 0 and y < height:
        index = y * width + x
        old_depth = wp.atomic_min(depth, index, z)
        if z <= old_depth:
            rgba[index] = color
```
TOCTOU race between depth test and color write
wp.atomic_min establishes the winning depth value atomically, but the subsequent color write is unprotected. Two threads rendering overlapping wireframe segments at nearly identical depths can both satisfy z <= old_depth (both see the unmodified value before the other's atomic_min completes), and then both write to rgba[index] in an unordered manner. The color stored in the buffer will correspond to whichever thread wrote last, which need not match the thread that actually holds the minimum depth after both atomic_min calls settle. For the point cloud path the two-phase key/resolve scheme avoids this problem; the same pattern should be applied to wireframe pixels, or the atomic operation should cover the combined depth-color update.
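The two-phase scheme can be illustrated with a minimal pure-Python sketch (no Warp): phase 1 resolves the winning depth per pixel, and phase 2 lets a fragment write its color only when its depth equals the resolved winner, so the stored color always belongs to the depth-test winner. The `render_two_phase` helper and the `(x, y, z, color)` fragment shape are hypothetical, not from the PR; a real GPU version would additionally pack a fragment id into the depth key to break exact-depth ties deterministically.

```python
def render_two_phase(fragments, width, height):
    """Two-phase depth resolve (sketch). `fragments` is a list of
    (x, y, z, color) tuples; names are hypothetical."""
    inf = float("inf")
    depth = [inf] * (width * height)
    rgba = [None] * (width * height)
    # Phase 1: establish the winning depth per pixel (atomic_min analogue).
    for x, y, z, _ in fragments:
        i = y * width + x
        if z < depth[i]:
            depth[i] = z
    # Phase 2: only the fragment holding the winning depth writes its color,
    # so rgba[i] can never disagree with depth[i].
    for x, y, z, color in fragments:
        i = y * width + x
        if z == depth[i]:
            rgba[i] = color
    return depth, rgba
```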
```python
wp.init()
wp.config.quiet = True
```
wp.init() executed at module import time
Calling wp.init() at the top level of _warp_impl.py means it runs the moment any rendering submodule is imported — including users who only want the pure-PyTorch ScalarFieldToRGBA or VectorFieldToRGBA torch backends. In environments without CUDA (CI runners, CPU-only inference containers) this will either error out or trigger slow device enumeration. The initialization should be deferred to first use, e.g., inside the @torch.library.custom_op implementations or behind a lazy try/except guard.
```python
if forward_raw.device.type == "cpu" and bool(
    (forward_raw.norm() <= 1.0e-12).item()
):
    raise ValueError("eye and center must not be equal")
forward = _normalize_torch(forward_raw)
up_hint = _normalize_torch(up)
right_raw = torch.linalg.cross(up_hint, forward, dim=0)
if right_raw.device.type == "cpu" and bool((right_raw.norm() <= 1.0e-12).item()):
    raise ValueError("up must not be parallel to the camera direction")
```
Input validation skipped on CUDA devices
Both _camera_basis and _bounds_tensor guard their invariant checks with if forward_raw.device.type == "cpu", so on CUDA the checks are never performed. Passing eye == center (degenerate forward vector), up parallel to the view direction (degenerate right vector), or bounds_max <= bounds_min in any dimension will silently produce NaN or zero vectors that propagate through the kernel, yielding a black/corrupt render with no diagnostic. The same CPU-only pattern is repeated in _bounds_tensor (line 88). The validation should be unconditional, or the tensors should be moved to CPU just for the check.
```python
for step in range(8192):
    if step > steps:
        break
    alpha = wp.float32(0.0)
    if steps > 0:
        alpha = wp.float32(step) / wp.float32(steps)
    z = z0 * (1.0 - alpha) + z1 * alpha
    for oy in range(-radius, radius + 1):
        for ox in range(-radius, radius + 1):
            _write_depth_tested_pixel(
                x + ox, y + oy, z, color, width, height, rgba, depth
            )
    # ...
    if x == x1 and y == y1:
        break
    e2 = 2 * err
    if e2 > -dy:
        err -= dy
        x += sx
    if e2 < dx:
        err += dx
        y += sy
```
Hardcoded range(8192) cap silently truncates long line segments
_draw_line_depth_tested loops for step in range(8192) and breaks when step > steps. For an edge whose projected screen-space extent spans more than 8192 pixels (e.g., a near-clipped edge at very high resolution or a very small FOV), the line will be silently cut off partway. The limit is invisible to callers and produces no warning. Consider deriving the cap from max(image_width, image_height) passed into the kernel, or at minimum documenting the constraint.
```python
s0 = _project_point(p0, camera, width, height, tan_half_fov, aspect)
s1 = _project_point(p1, camera, width, height, tan_half_fov, aspect)

if s0[2] <= near or s0[2] >= far or s1[2] <= near or s1[2] >= far:
```
Wireframe clips entire edge when any endpoint is outside the clip range
_wireframe_render_kernel returns early if either endpoint has s[2] <= near or s[2] >= far. A segment where one vertex is behind the camera and the other is in front will be silently dropped entirely rather than clipped. For scenes with edges that straddle the near plane this produces visually missing lines. The standard fix is to perform a parametric clip of the segment against the near/far planes before rasterizing.
/blossom-ci
PhysicsNeMo Pull Request
Description
Rendering Kernels